LOG4J2-3068 - JCToolsBlockingQueueFactory can use MpscBlockingConsumerArrayQueue #485

franz1981 · 2021-04-09T05:42:47Z

https://issues.apache.org/jira/browse/LOG4J2-3068

franz1981 · 2021-04-09T05:45:07Z

Bumping JCTools version is necessary due to some recent fixes around one of the used qs.

It's a shame we're exposing a blocking q API there, or we could've used JCTools/JCTools#340 to batch consume from the queue.

jvz

Can you add a manual entry for this as well? With the other options for this plugin in https://github.com/apache/logging-log4j2/blob/master/src/site/asciidoc/manual/appenders.adoc somewhere

vy · 2021-04-09T14:45:39Z

@franz1981, thanks so much for the contribution. Would you mind updating changes.xml too, please?

carterkozak · 2021-04-09T14:57:14Z

Thanks for the contribution @franz1981, your PRs are appreciated as always!

I normally handle the changes.xml pieces when I merge PRs since it can be non-obvious (I've certainly gotten it wrong a few times, and I don't think it makes sense to make contributors learn our system).

carterkozak · 2021-04-09T15:03:20Z

It's a shame we're exposing a blocking q API there, or we could've used JCTools/JCTools#340 to batch consume from the queue.

We could support a lighter weight API with the ability to utilize both BlockingQueue (retaining backwards compatibility) as well as JCTools/JCTools#340, what do you think? Alternatively a separate JCToolsAsyncAppender is an option.

…rArrayQueue

franz1981 · 2021-04-10T14:24:33Z

@jvz

Can you add a manual entry for this as well?

Done!

@carterkozak

Alternatively a separate JCToolsAsyncAppender is an option.

I would go into this direction instead :)
TBH the existing disruptor solution is a better fit in term of features set:

FIFO (I suppose that's mandatory right? If we just need FIFO for the same thread, there are much faster options on JCTools ala https://github.com/JCTools/JCTools/blob/master/jctools-core/src/main/java/org/jctools/queues/MpscCompoundQueue.java#L136
ie using a fixed size of queues to distribute contention on the tail)
Object pool: this is unbeatable
batch-friendly: no idea how much this feature is used on log4j, but it can be rather important while appending files to batch writes syscalls (on FileChannel pwrite).

Thinking about a JCToolsAsyncAppender I still need to understand if being bounded is such a must have: I'm a great fan of backpressure, really, but with logging...is different.
I see different strategies that could be employed to satisfy different application needs and AsyncQueueFullPolicy already expose some of these: I still see a value in using an unbounded q especially if compared with policies that would change the logs order or block the application threads.

If the unbounded q path is feasable, on JCTools we have some option that perform great under heavy contention eg xadd qs.
The nice thing is that a "mild form" of backpressure is still possible with such alternativies by carefully choosing how to replicate the "object pooling" feature of the Disruptor and use the fixed capacity of such pool to limit the backlog on the unbounded q.

carterkozak · 2021-04-11T14:15:01Z

Our fully async disruptor implementation is indeed very difficult to beat in benchmarks, largely for the reasons you've mentioned, also due to the tighter control of the threading model we get by integrating it directly into the logger system (See AsyncLoggerContext, AsyncLogger) rather than applying it to individual appenders (which requires defensive copies of LogEvent objects). This makes it more difficult to like-for-like compare a jctools implementation to the disruptor implementation.

FIFO (I suppose that's mandatory right? If we just need FIFO for the same thread, there are much faster options on JCTools ala https://github.com/JCTools/JCTools/blob/master/jctools-core/src/main/java/org/jctools/queues/MpscCompoundQueue.java#L136
ie using a fixed size of queues to distribute contention on the tail)

Mandatory? That's difficult to say. In the general case I'd say so, however many services push events into complex logging pipelines which handle event ordering based on timestamps or some other identifier, so it doesn't matter so much what order events are produced in (depending on the system, of course!)

Object pool: this is unbeatable

Agreed, this is a big one for us.

batch-friendly: no idea how much this feature is used on log4j, but it can be rather important while appending files to batch writes syscalls (on FileChannel pwrite).

The default configuration doesn't take advantage of this property, however configuring most appenders immediateFlush property allows them to wait until a LogEvent with endOfBatch=true before flushing, in my experience this increases logging throughput by an order of magnitude.

This reminds me, I had meant to research the feasibility of something like libaio for logging file i/o, I recall that you may have some experience with the java bindings? ;-) I imagine eventually io_uring may allow us to batch appends to multiple files into a single syscall, this could be huge as it's common to have some subset of events recorded to multiple files, but I admittedly haven't read as much about how this works in practice as I should.

I still see a value in using an unbounded q especially if compared with policies that would change the logs order or block the application threads.

Thar be dragons! It would be remarkably easy to OOM with an unbounded queue if used incorrectly -- in production systems it's often safer to increase the ringbuffer size and use a discarding policy when the queue is filled, but I don't mind allowing the option for an unbounded queue (on AsyncAppender one can already configure a LinkedBlockingQueue with Integer.MAX_VALUE capacity!). I do like the idea of a queue with a dynamic footprint, at work I use a large constant size ringbuffer which accounts for a large chunk of system memory utilization regardless of load or expectations for the service.

franz1981 · 2021-04-11T15:41:42Z

This reminds me, I had meant to research the feasibility of something like libaio for logging file i/o, I recall that you may have some experience with the java bindings? ;-) I imagine eventually io_uring may allow us to batch appends to multiple files into a single syscall, this could be huge as it's common to have some subset of events recorded to multiple files, but I admittedly haven't read as much about how this works in practice as I should.

libaio need some JNI, as io_uring, but IMO io_uring is to be preferred because more modern. If it worths I've already implemented a version of libaio that minimize the usage of JNI (in favour of Unsafe now and VarHandle/Foreign Access in the future) ie apache/artemis-native#10
and that can be used (with the right changes) as POC to see if it makes sense. Given the nature of being write-only, probably a logger library is a good fit, hence being able to NOT trigger the OS page cache can bring some benefit.

franz1981 · 2021-04-14T11:21:04Z

@carterkozak do we have some benchmark to run this PR vs the baseline and see if it worths?
Sorry to ask directly, but I don't have any decent HW but my laptop at hand in this very moment..

carterkozak · 2021-04-14T23:58:45Z

Of course, I’d be happy to run the benchmarks! Unfortunately I’m not sure I’ll be able to get to it this week with other things going on.

carterkozak · 2021-08-03T21:54:18Z

Sorry for the delay, @franz1981!

This PR rebased to master

java -jar log4j-perf/target/benchmarks.jar ".*AsyncAppenderLog4j2Benchmark\.throughputSimple*"

Benchmark                                                                   (configFileName)   Mode  Cnt        Score        Error  Units
AsyncAppenderLog4j2Benchmark.throughputSimple           perf5AsyncApndNoLoc-noOpAppender.xml  thrpt    4  1454918.276 ± 233834.928  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple     perf5AsyncApndDsrptrNoLoc-noOpAppender.xml  thrpt    4  2386789.891 ±  26025.458  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple      perf5AsyncApndMpscQNoLoc-noOpAppender.xml  thrpt    4  2754800.649 ±  27636.937  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple  perf5AsyncApndMpscParkQNoLoc-noOpAppender.xml  thrpt    4  2172734.114 ±  33748.888  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple      perf5AsyncApndXferQNoLoc-noOpAppender.xml  thrpt    4  2360362.123 ±  54748.437  ops/s

With more threads:
java -jar log4j-perf/target/benchmarks.jar ".*AsyncAppenderLog4j2Benchmark\.throughputSimple*" -t 28

Benchmark                                                                   (configFileName)   Mode  Cnt        Score        Error  Units
AsyncAppenderLog4j2Benchmark.throughputSimple           perf5AsyncApndNoLoc-noOpAppender.xml  thrpt    4    98325.087 ±  14675.006  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple     perf5AsyncApndDsrptrNoLoc-noOpAppender.xml  thrpt    4  1351814.397 ± 283184.236  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple      perf5AsyncApndMpscQNoLoc-noOpAppender.xml  thrpt    4  1337458.477 ± 686440.041  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple  perf5AsyncApndMpscParkQNoLoc-noOpAppender.xml  thrpt    4  1934682.312 ± 157883.496  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple      perf5AsyncApndXferQNoLoc-noOpAppender.xml  thrpt    4  1037054.911 ± 317250.949  ops/s

Master without this PR

java -jar log4j-perf/target/benchmarks.jar ".*AsyncAppenderLog4j2Benchmark\.throughputSimple*"

Benchmark                                                                   (configFileName)   Mode  Cnt        Score        Error  Units
AsyncAppenderLog4j2Benchmark.throughputSimple           perf5AsyncApndNoLoc-noOpAppender.xml  thrpt    4  1464295.244 ± 196064.898  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple     perf5AsyncApndDsrptrNoLoc-noOpAppender.xml  thrpt    4  2196993.122 ± 732245.757  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple      perf5AsyncApndMpscQNoLoc-noOpAppender.xml  thrpt    4  2542884.245 ± 233213.133  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple  perf5AsyncApndMpscParkQNoLoc-noOpAppender.xml  thrpt    4  3173527.805 ± 107428.527  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple      perf5AsyncApndXferQNoLoc-noOpAppender.xml  thrpt    4  2179666.490 ±  83951.441  ops/s

With more threads:
java -jar log4j-perf/target/benchmarks.jar ".*AsyncAppenderLog4j2Benchmark\.throughputSimple*" -t 28

Benchmark                                                                   (configFileName)   Mode  Cnt        Score         Error  Units
AsyncAppenderLog4j2Benchmark.throughputSimple           perf5AsyncApndNoLoc-noOpAppender.xml  thrpt    4   105100.502 ±   17273.746  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple     perf5AsyncApndDsrptrNoLoc-noOpAppender.xml  thrpt    4  1075800.634 ±  278110.477  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple      perf5AsyncApndMpscQNoLoc-noOpAppender.xml  thrpt    4  1517297.622 ±  180276.994  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple  perf5AsyncApndMpscParkQNoLoc-noOpAppender.xml  thrpt    4  3613600.519 ±  102724.103  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple      perf5AsyncApndXferQNoLoc-noOpAppender.xml  thrpt    4  1100745.644 ± 2014574.543  ops/s

Benchmark Implementation

Draft PR with my benchmark changes used for comparison here: #556

ppkarwasz · 2023-03-01T07:07:51Z

This was closed automatically by Github because we renamed the release-2.x branch to 2.x. Feel free to resubmit to the 2.x branch.

jvz requested changes Apr 9, 2021

View reviewed changes

LOG4J2-3068 - JCToolsBlockingQueueFactory can use MpscBlockingConsume…

d67637c

…rArrayQueue

franz1981 force-pushed the LOG4J2-3068 branch from 732ac48 to d67637c Compare April 10, 2021 14:05

jvz added the performance Issues or PRs that affect performance, throughput, latency, etc. label Jan 25, 2023

vy deleted the branch apache:master February 28, 2023 15:02

vy closed this Feb 28, 2023

Uh oh!

LOG4J2-3068 - JCToolsBlockingQueueFactory can use MpscBlockingConsumerArrayQueue #485

LOG4J2-3068 - JCToolsBlockingQueueFactory can use MpscBlockingConsumerArrayQueue #485

Uh oh!

Conversation

franz1981 commented Apr 9, 2021

Uh oh!

franz1981 commented Apr 9, 2021

Uh oh!

jvz left a comment

Choose a reason for hiding this comment

Uh oh!

vy commented Apr 9, 2021

Uh oh!

carterkozak commented Apr 9, 2021

Uh oh!

carterkozak commented Apr 9, 2021

Uh oh!

franz1981 commented Apr 10, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

carterkozak commented Apr 11, 2021

Uh oh!

franz1981 commented Apr 11, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

franz1981 commented Apr 14, 2021

Uh oh!

carterkozak commented Apr 14, 2021

Uh oh!

carterkozak commented Aug 3, 2021

This PR rebased to master

Master without this PR

Benchmark Implementation

Uh oh!

ppkarwasz commented Mar 1, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

franz1981 commented Apr 10, 2021 •

edited

Loading

franz1981 commented Apr 11, 2021 •

edited

Loading